DOCUMENT RESUME

ED 050 181    TM 000 84b

AUTHOR: Angoff, William H., Ed.

TITLE: The College Board Admissions Testing Program: A Technical Report on
Research and Development Activities Relating to the Scholastic Aptitude Test
and Achievement Tests.

INSTITUTION: Educational Testing Service, Princeton, N.J.

SPONS AGENCY: College Entrance Examination Board, New York, N.Y.

PUB DATE: 71

AVAILABLE FROM: College Entrance Examination Board, Box 592, Princeton,
New Jersey 08540 ($5.00 per copy with a 20% discount for orders of five or
more copies)

EDRS PRICE: EDRS Price MF-$0.65 HC Not Available from EDRS.

DESCRIPTORS: *Achievement Tests, Admission (School), Admission Criteria,
College Bound Students, College Curriculum, *College Entrance Examinations,
Mathematics, Prediction, *Predictive Validity, Predictor Variables,
Psychometrics, *Research Reviews (Publications), Statistical Analysis,
Tables (Data), Test Construction, *Testing Programs, Tutoring, Verbal Tests

IDENTIFIERS: SAT, *Scholastic Aptitude Test

ABSTRACT:

This report was prepared for the specialist who, well acquainted with the
concepts and statistics of educational measurement, needs technical
information about the Admissions Testing Program of the College Entrance
Examination Board. It brings together the findings of a considerable number
of studies of the Scholastic Aptitude Test (SAT) and the Achievement Tests.
Three publications predate this book and they serve as a compendium of
available information on the College Board Admissions Testing Program up to
1957, stressing proper use and test interpretation for admission and
guidance. Recent manuals tend to be more interpretive than explicative. It
is the aim of this book, however, to fill the information gap by providing
the data needed for a comprehensive technical evaluation of the tests. In
order to indicate the setting within which the program functions, a general
historical sketch, as well as administrative and psychometric
considerations, are briefly outlined. The Scholastic Aptitude Test and
Achievement Tests are discussed and interpretive summary statistics for
various subgroups of students are presented, as are extensive validity
studies of the admissions tests. Finally, the major findings of special
studies that have been made in connection with the SAT and the Achievement
Tests are summarized. (Author/CK)






PERMISSION TO REPRODUCE THIS COPYRIGHTED MATERIAL BY MICROFICHE ONLY
HAS BEEN GRANTED BY


UJ , H _Q.Mf-ot-.r_ . . 



TO ERIC AND ORGANIZATIONS OPERATING
UNDER AGREEMENTS WITH THE U.S. OFFICE
OF EDUCATION. FURTHER REPRODUCTION
OUTSIDE THE ERIC SYSTEM REQUIRES
PERMISSION OF THE COPYRIGHT OWNER.



The College Board Admissions Testing Program 

A technical report on research and development activities
relating to the Scholastic Aptitude Test
and Achievement Tests



William H. Angoff 
Editor 



COLLEGE ENTRANCE EXAMINATION BOARD, NEW YORK, 1971







Copies of this book may be ordered from: 

Publications Order Office

College Entrance Examination Board

Box 592, Princeton, New Jersey 08540

The price is $5 per copy, with a discount of 20 percent
for orders of five or more copies.

This book was prepared and produced for the College
Board by Educational Testing Service.

Copyright © 1971 by College Entrance Examination
Board. All rights reserved. Library of Congress Catalog
Number 75-150163. Printed in the United States of
America.






Contents 

Foreword xiii 

Preface xiv 

Chapter I. The Admissions Testing Program

WILLIAM H. ANGOFF and HENRY S. DYER

Historical background 1
Administrative considerations 4
Nature of the program 4
Administration of the tests 5
Influence on curriculum 6
Coaching 7
Security 7
Psychometric considerations 8
Validity 8
Reliability 8
Parallelism of forms 9
Item analysis 9
Test analysis 10
The score scale 10
The norms 11
Scoring methods 12
References 13

Chapter II. The Scholastic Aptitude Test

THOMAS F. DONLON and WILLIAM H. ANGOFF

The purpose of the SAT 15
A description of the current test 17
History and evolution of the SAT 19
Assembly of the SAT-verbal sections 21
Assembly of the SAT-mathematical sections 23
General comments on the assembly 25
The pretest program 26
Parallelism and reliability of the SAT 28
Validity 28
Speededness 29
Correlation between SAT-verbal and SAT-mathematical scores 30
Factor analyses of the SAT 32
Origin and maintenance of the SAT score scale 32
Current procedures and their results 35
The 800-score 40
Score change 41
Coaching 45
Preliminary Scholastic Aptitude Test (PSAT) 45
Arrangements for handicapped students 46
References 46



Chapter III. The Achievement Tests

WILLIAM E. COFFMAN

Introduction 49
Purpose 49
Historical background 50
Evolution of the Achievement Testing Program 51
Testing dates 53
Construction procedures 54
The committee system 54
Test planning 55
Curriculum bias in the Achievement Tests 57
The quality control system 58
Scaling and equating the Achievement Tests 62
Establishing the initial scale 66
Equating subsequent forms 70
Statistical characteristics of the Achievement Tests 71
References 77



Chapter IV. Descriptive Statistics on College Board Candidates
and Other Reference Groups

W. B. SCHRADER and E. ELIZABETH STEWART

Introduction 79
Uses of descriptive statistics on test performance 79
Score interpretation 79
Group comparisons 79
Methodological considerations 79
Performance on the SAT 81
High school senior and college freshman norms 81
Local norms 84
Candidate norms, 1967-68 and 1968-69 87
Trends in candidate performance 88
Performance on Achievement Tests 91
Nature of Achievement Test candidates 91
Characteristics of groups taking specific tests 92
Candidate norms, 1967-68 and 1968-69 94
Trends in candidate performance 101
Summary 109
References 115

Chapter V. The Predictive Validity of College Board Admissions Tests

W. B. SCHRADER

Introduction 117
The role of predictive validity in the development and use of
College Board tests 117
Designing conventional regression studies 118
Choice of criterion 118
Choice of groups 119
Choice of predictors 120
Statistical analysis 121
Reporting the results 121
Interpreting the results 121
Designing validity studies for special purposes 122
Comparison of alternative predictors 122
Placement use 123
Interpreting validity coefficients as a measure of predictive value 124
Evidence on the predictive validity of the SAT 125
Results for students in liberal arts and general programs 127
Results for students in special curriculums or divisions 130
Validity for foreign students attending college in the United States 132
Evidence on the predictive validity of the Achievement Tests 135
Prediction of freshman average grade 135
Prediction of grades in courses or subjects 138
Central prediction: Efforts to improve or facilitate prediction for
groups of colleges 139
References 144

Chapter VI. Special Studies

JOHN FREMER and MARJORIE O. CHANDLER

Introduction 147
Effects of coaching 147
Coaching in independent preparatory schools 148
Coaching relatively less able students 149
Practice and growth 150
Problems in the explanation of score changes 150
Summaries of score changes 150
Contribution of practice and growth to score changes 151
Use of retest scores in prediction 151
Students view their performance 152
Fatigue and anxiety 152
Opinions of students and school officials 153
Effect of fatigue on test scores 153
Effects of anxiety on test scores 154
Appropriateness of a single SAT 155
Consideration of new item types for the SAT 156
New item types for predicting four-year performance 156
Validity of new item types for high-aptitude students 157
Validity of new item types at various ability levels 158
New test content 159
Biographical Inventory 159
Tests of Developed Ability 160
Academic Interest Index 161
Myers-Briggs Type Indicator 162
Formulating Hypotheses Test 162
English essay 163
Opinions of teachers and students on essay testing 163
The first objective English Composition Test exercise 164
The Interlinear exercise 164
Comparison of six item types 164
The General Composition Test 165
The Writing Sample 165
The new English Composition Test essay 166
Points of view in essay grading 166
Special populations 167
Sex differences 168
Cultural differences 170
SAT scores at the extremes of the score range 172
Curriculum change and diversity 174
Impact of new physics curriculums 175
Different cognitive approaches 176
Achievement in chemistry 176
Culver Military Academy: A case study 177
A look ahead 178
References 178






Figures and Tables 



Chapter II 



Figure 2.1 Sample SAT instructions 18
Figure 2.2 Genealogical chart for SAT-verbal 34
Figure 2.3 Genealogical chart for SAT-mathematical 36
Figure 2.4 Illustration of highly consistent equating results 38
Figure 2.5 Illustration of marginally acceptable equating results 39
Table 2.1 Data relating to speededness: SAT-verbal sections 20
Table 2.2 Data relating to speededness: SAT-mathematical sections 20
Table 2.3 Item types used in SAT-verbal and SAT-mathematical sections,
1926-69 21
Table 2.4 Classification scheme for SAT-verbal items 22
Table 2.5 Distributions of SAT-verbal items by difficulty level 24
Table 2.6 Distributions of SAT-mathematical items by difficulty level 25
Table 2.7 Distribution of equated deltas for SAT-verbal and
SAT-mathematical items at time of pretest: 1965-66 27
Table 2.8 Internal-consistency reliability estimates and standard errors
of measurement for 12 recent SAT forms 28
Table 2.9 Estimates of parallel-form reliability for SAT scores 29
Table 2.10 Scaled score equivalents to selected raw scores for eight
SAT forms, December 1967 to May 1969 29
Table 2.11 Percent completing the test for 12 SAT forms, December
1966 to May 1969 30
Table 2.12 Percent completing three quarters of the test for 12 SAT
forms, December 1966 to May 1969 30
Table 2.13 Mean and standard deviation of the number not reached for
12 SAT forms, December 1966 to May 1969 31
Table 2.14 Correlation between SAT-verbal and SAT-mathematical scores
for 12 forms, December 1966 to May 1969 31
Table 2.15 Trend in the correlation between SAT-verbal scores and
SAT-mathematical scores 31
Table 2.16 Maximum scaled scores obtained with linear equating for
eight SAT forms, 196 FGG 40
Table 2.17 Specified distributions of item difficulties and mean biserials
for the SAT 41
Table 2.18 Maximum scaled scores obtained with linear equating for four
SAT forms, December 1966 to May 1967 41
Table 2.19 Maximum scaled scores obtained with linear equating for
eight SAT forms, December 1967 to May 1969 42
Table 2.20 Seven-year summary of change in SAT-verbal scores from
March or May of the junior year to December or January
of the senior year 43
Table 2.21 Seven-year summary of change in SAT-mathematical scores
from March or May of the junior year to December or January
of the senior year 44

Chapter III 

Table 3.1 Example of a test outline for an Achievement Test
(Social Studies) 56
Table 3.2 Examples of item analysis computer output 60
Table 3.3 Example of score distributions and conversion data reported
in test analyses of Achievement Tests (Physics Test, May 1963) 63
Table 3.4 Example of data on reliability, intercorrelations of sections,
and speededness of sections reported in test analyses of
Achievement Tests (Physics Test, May 1963) 64
Table 3.5 Example of detailed data on speededness reported in test
analyses of Achievement Tests (Physics Test, May 1963,
"traditional" sample) 65
Table 3.6 Example of detailed data on speededness reported in test
analyses of Achievement Tests (Physics Test, May 1963,
PSSC sample) 66
Table 3.7 Example of item analysis summary data reported in test
analyses of Achievement Tests (Physics Test, May 1963) 67
Table 3.8 Summary statistics for English, Social Studies, Science, and
Mathematics Achievement Tests for candidates taking both the
SAT and the Achievement Tests in 1968-69 69
Table 3.9 Summary statistics for language Achievement Tests for
candidates taking both the SAT and the Achievement Tests
in 1968-69 70
Table 3.10 Example of scatterplot form used to record item data for
Achievement Tests 72
Table 3.11 Specified and actual item statistics for Achievement Tests
administered in May 1967 and January 1968 73
Table 3.12 Numbers of items (n), reliability estimates (r), standard
errors of measurement (SE), and indexes of speededness
(a, b, c) for Achievement Tests administered in May 1967
and January 1968 75
Table 3.13 Reliability coefficients, multiple correlation coefficients, and
estimates of three different proportions of variance of
Achievement Test scores for samples of candidates taking the
SAT and Achievement Tests in 1968-69 76



Chapter IV

Figure 4.1 Attendance of twelfth-grade students at SAT sessions, 1956-57
through 1968-69 90
Figure 4.2 Ratio of number of SATs taken by twelfth-grade students to
number of births 18 years earlier, 1956-57 through 1968-69 91
Figure 4.3 Attendance of twelfth-grade students at Achievement Test
sessions, 1956-57 through 1968-69 93
Figure 4.4 Percentage of Achievement Tests administered in five
subject-matter categories, 1956-57 through 1968-69 111
Table 4.1 Scholastic Aptitude Test means and standard deviations for a
national sample of secondary school seniors tested in 1960,
by sex and academic status in 1960-61 83
Table 4.2 SAT-verbal and SAT-mathematical mean scores for applicants,
accepted applicants, and enrolled students reported by 18
colleges in the Manual of Freshman Class Profiles of 1962,
1964, and 1965-67 85
Table 4.3 Numbers of colleges in which changes of various amounts
occurred in percentile ranks for selected scaled scores between
the 1963 edition and the 1965-67 edition of the Manual of
Freshman Class Profiles 88
Table 4.4 Means, standard deviations, and numbers of Scholastic
Aptitude Test scores, classified by candidate’s sex and
educational level, 1967-68 and 1968-69 89
Table 4.5 Means, standard deviations, and numbers of cases (in
thousands) for all candidates taking the SAT, by year,
1956-57 through 1968-69 92
Table 4.6 Percentages of Achievement Test candidates taking the various
tests at least once during the last two years of secondary school 94
Table 4.7 Scholastic Aptitude Test means and standard deviations for
Achievement Test candidates 95
Table 4.8 Percentages of Achievement Test candidates tested in the
junior year, senior year, or both years, by specific
Achievement Test 96
Table 4.9 Means, standard deviations, and numbers of Achievement
Test scores, classified by test and by candidate’s sex and
educational level, 1967-68 and 1968-69 97
Table 4.10 Mean differences in Achievement Test scores earned by boys
and girls, 1967-68 and 1968-69 100
Table 4.11 Mean differences in Achievement Test scores earned by juniors
and seniors, 1967-68 and 1968-69 101
Table 4.12 Means, standard deviations, and numbers of scores on
Achievement Tests in foreign languages, classified by test and
by candidate’s sex, educational level, and number of semesters
of study of the language in secondary school, 1967-68 and
1968-69 102
Table 4.13 Mean differences in Achievement Test scores in foreign
languages earned by candidates with different amounts of
language study, 1967-68 and 1968-69 107
Table 4.14 Means, standard deviations, and numbers of scores on
Achievement Tests in mathematics, classified by test and by
candidate’s sex, educational level, and number of semesters of
study of secondary school mathematics, 1967-68 and 1968-69 108
Table 4.15 Mean differences in Mathematics Level I Achievement Test
scores earned by candidates with different amounts of
mathematics study, 1967-68 and 1968-69 110
Table 4.16 Percentages of Achievement Tests administered in five
subject-matter categories, by year, 1956-57 through 1968-69 110
Table 4.17 Volumes for specific Achievement Tests expressed as
percentages of total Achievement Test volumes, by year,
1956-57 through 1968-69 112
Table 4.18 Attendance at Achievement Test sessions, classified by
candidate’s educational level, 1956-57 and 1968-69 113
Table 4.19 Means, standard deviations, and numbers of cases (in
thousands) for all candidates taking Achievement Tests,
by year, 1956-57 through 1968-69 113

Chapter V

Table 5.1 Correlation coefficients of selected faculty ratings with general
desirability rating and with first-year grades 120
Table 5.2 Example of expectancy table prepared by the College Board
Validity Study Service 122
Table 5.3 Example of scatter diagram prepared by the College Board
Validity Study Service 123
Table 5.4 Range of correlation coefficients that would be expected to
include 95 percent of observed values for selected population
values and sample sizes 125
Table 5.5 Relation between standing on predictor and standing on
criterion for various values of the correlation coefficient 126
Table 5.6 Validity coefficients of SAT and high school record: selected
percentiles of validity coefficients based on students in liberal
arts and general programs classified by sex 127
Table 5.7 Validity coefficients of SAT and high school record: selected
percentiles of validity coefficients based on students in liberal
arts and general programs classified by level of verbal
ability and by sex 128
Table 5.8 Validity coefficients of SAT and high school record: selected
percentiles of validity coefficients based on students in liberal
arts and general programs classified by homogeneity of verbal
ability and by sex 129
Table 5.9 Validity coefficients of SAT and high school record: frequency
distributions and selected percentiles of validity coefficients
based on liberal arts and general groups classified by
homogeneity in verbal ability 131
Table 5.10 Validity coefficients of SAT and high school record: results for
engineering, science, and architecture groups 133
Table 5.11 Validity coefficients of SAT and high school record: results for
business, education, and other groups 134
Table 5.12 Effect on multiple correlation coefficient of using Achievement
Test scores with SAT and high school records for predicting
first-term or first-year grades: women in liberal arts and
general programs 136
Table 5.13 Effect on multiple correlation coefficient of using Achievement
Test scores with SAT and high school record for predicting
first-term or first-year grades: men in liberal arts and general
programs and engineering students 137
Table 5.14 Effect on multiple correlation coefficient of using Achievement
Test scores with SAT and high school record for predicting
first-term or first-year average grades: various student groups 138
Table 5.15 Correlation coefficients of SAT-V, English Composition Test,
and high school record with English grades 140
Table 5.16 Correlation coefficients of SAT-M, Mathematics Achievement
Tests, and high school record with freshman mathematics grades 141
Table 5.17 Correlation coefficients of SAT, Science and Mathematics
Achievement Tests, and high school record with science grades 142
Table 5.18 Correlation coefficients of SAT-V, foreign language Achievement
Tests, and high school record with foreign language grades 143
Table 5.19 Correlation coefficients of SAT, Social Studies Achievement
Test, and high school record with social studies grades 143

Chapter VI

Table 6.1 Relative achievement of the CBA, CHEM, and control groups
on three tests in chemistry 177



Foreword 



The College Entrance Examination Board has long since ceased to be engaged 
solely with examinations, and the examinations it now sponsors are not all concerned 
with college entrance. It is not surprising, however, that in the minds of many, the 
College Board is associated with "the College Boards"— the Scholastic Aptitude 
Test and the Achievement Tests — which constitute the Board's Admissions 
Testing Program.

This program, the largest by far of the Board's testing programs, has entered the
lives of over a million students in each recent year, affecting the decisions made by 
hundreds of colleges and the counseling services offered in thousands of schools. It 
is natural that a program involving so many interests should have given rise to 
many questions about its nature and value. The Board is indebted to Educa- 
tional Testing Service (ETS), which administers the Admissions Testing Program, 
for providing in this book information that will answer questions about the more 
technical aspects of the program. 

It is a genuine pleasure for me, on behalf of the College Board, to thank my 
colleagues on the staff of ETS who worked so hard and well to assemble and report 
these facts about the Scholastic Aptitude Test and Achievement Tests. I am keenly 
aware of the special burdens that William Angoff has borne as editor, and share
with all who were involved in the project admiration and appreciation for the grace
and skill with which he brought the project to completion. 

The publication appears at a time when changes in the Board's testing programs 
are in the wind, some of them stimulated by the work and report of the Commission
on Tests, appointed in 1967 by Richard Pearson, former President of the Board,
"to review all of the Board's existing examinations, gather evidence of the need for 
change, and consider what tests may be needed a decade hence." This book shows 
that significant changes have taken place time after time in the Board’s college
entrance tests, and it is fair to assume that further changes may well be necessary
if the program is to continue to fit the needs of colleges, schools, and students. The
resourcefulness and talent underlying the techniques described in this book offer 
promise that the best possible efforts will be brought to bear on the development of 
whatever new tests and techniques the needs of the future demand. 

John A. Valentine
College Entrance Examination Board
September 1970




Preface 



This report was prepared for the specialist who needs technical information about 
the Admissions Testing Program of the College Entrance Examination Board. It 
brings together the findings of a considerable number of studies of the Scholastic 
Aptitude Test (SAT) and the Achievement Tests. It assumes a reader fairly well
acquainted with the concepts and statistics of educational measurement. 

Three publications might be considered the book’s predecessors: College Board
Scores, Their Use and Interpretation, No. 1, published in 1953; College Board Scores,
Their Use and Interpretation, No. 2, published in 1955; and 1957 Supplement to
College Board Scores, No. 2. These publications are a compendium of available
information on the College Board Admissions Testing Program up to 1957. The 
present book occasionally refers to the earlier documents, but focuses mainly on 
more recent data. 

The emphasis of this book is also somewhat different from that of its predecessors. 
The earlier publications were, in a sense, "how to do it" books. Their purpose was
to describe how the tests might properly be used in admissions and guidance and
to provide data needed for interpreting the test scores. To a considerable extent,
this function has been taken over by College Board Score Reports: A Guide for
Counselors and Admissions Officers, which is revised annually and widely distributed.
Mention should also be made of the computation manual, Predicting College Grades,
published in 1961, and the Manual of Freshman Class Profiles, which was published
until 1969, when it was incorporated in The College Handbook.

These progeny of the old use-and-interpretation books retain some of the "how
to do it" emphasis of their ancestors, but tend to be more interpretive than
explicative. They do not, however, provide all the data needed for a comprehensive
technical evaluation of the tests. The aim of this book is to fill that gap. 

The Board’s Admissions Testing Program must respond effectively to a variety 
of unique and complex demands. In order to indicate the setting within which the 
program operates, the first chapter describes the program as a whole: its history,
how it is organized, the unique demands it faces, and technical matters involved in
its operation that arise from the demands. 

The second chapter discusses the SAT; the third is about the Achievement Tests.
The fourth chapter interprets summary statistics for various subgroups of students,
and the fifth reports on some of the extensive validity studies of the admissions
tests. The sixth and final chapter summarizes the major findings of special studies
that have been made in connection with the SAT and the Achievement Tests.
Although there is, quite properly, an interrelationship among the topics covered
in the various chapters, the chapters were written independently with the intent
that each would stand as an integral unit, containing all the material relevant to it.
However, since some material, especially that having to do with technical
procedures, has relevance to more than one chapter, there is inevitably some
repetition across chapters.

The authors were chosen for their special competence in the subject of their 
chapters, but the book as a whole is the product of many other persons, most of 
whom are connected in one way or another with the Admissions Testing Program. 
In fact, the contributors include a substantial number of the staff members of the 
College Board and Educational Testing Service, and for this reason it would be 
practically impossible to list them in this preface for the credit they richly merit. 

Especially helpful were those individuals who served with me on the review com- 
mittee, some of whom were also chapter authors: William E. Coffman, John M. 
Duggan, Henry S. Dyer, William B. Schrader, Robert J. Solomon, and John A. 
Valentine. 

Additionally, the following members of the College Board Committee of Examin- 
ers in Aptitude Testing were of great aid in reviewing chapters in manuscript form: 
Carl Bereiter, Frederick B. Davis, John R. Hills, Julian C. Stanley (Chairman), 
Warren S. Torgerson, and Dean K. Whitla. A special debt of gratitude is owed to 
David E. Loye for his able editorial assistance and to the staff of the ETS publications
division for coordinating publication of the book. 

The Admissions Testing Program is revised continually to meet changing con- 
ditions. Thus, it can be expected that this book, which describes the program, 
will be revised as necessary to report new developments in the program and to 
reflect the results of research on new as well as old problems. 



William H. Angoff
Educational Testing Service
September 1970



CHAPTER I



The Admissions Testing Program 

WILLIAM H. ANGOFF and HENRY S. DYER 



Historical background 

The College Entrance Examination Board was organized during a meeting held
at Columbia University in New York on November 17, 1900. The move
culminated efforts over several years in this direction by the Association of
Colleges and Preparatory Schools of the Middle States and Maryland, and a
number of educators in the East, notably Nicholas Murray Butler of Columbia
and Charles W. Eliot of Harvard. The event was, according to the Board’s
first historian, the first organized "attempt to introduce law and order into
an educational anarchy which toward the close of the nineteenth century had
become exasperating, indeed intolerable, to schoolmasters." (Fuess, 1950, p. 3.)*

At this time there was, in the opinion of the founders of the Board,
appallingly little agreement among the colleges about the types of
subject-matter preparation and standards of proficiency required of the
applicants. One headmaster, for example, complained that out of 40 or so boys
preparing for college in his school, he had, in reality, more than 20
different classes. The effect of this diversity on the secondary schools was
to make the task of preparing their students for college an extremely
difficult one, and to make the task of the student who was not sure which
college he hoped to attend difficult and confusing as well.

In its attempt, then, to accomplish its purpose and introduce an order into
the transition from school to college, the College Board established the
beginnings of a system of syllabi or "course requirements" on which schools
and colleges could agree, and which might form the basis of a system of
examinations offering the uniformity that was so badly needed. That is to
say, the examinations would be uniform in subject matter and uniformly
administered at uniform times, but held in many places to meet the
convenience of the students, and they would be uniformly graded. It was
expected that a system of examinations of this sort would effect marked
savings of time, money, and effort in administering college admissions, that
it would greatly aid the work of the secondary schools by reducing confusion
and easing the strain on students, and that it would represent a cooperative
effort of a group of colleges and secondary schools to achieve a set of
common goals without asking the colleges to surrender their prerogatives as
to the particular examinations they required of their applicants or the
manner in which they might wish to select their students.

* The authors wish to acknowledge assistance provided by the outline, and, in
many instances, the particular phrasing used in Dr. Fuess' excellent overview
of the Board's history. They also wish to acknowledge the invaluable source
of information provided by the College Board's Annual Reports.

It is important to recognize that the examinations were secondary to the
main purpose of the Board, which was to provide a channel of communication
between the schools and colleges and to encourage a degree of uniformity in
the secondary school curriculum. On the other hand, it is more than likely
that these purposes would never have been achieved without the
instrumentality of the examination system which, upon acceptance by the
colleges as a way of setting standards, effectively paved the way for the
introduction of uniform curriculums in the schools.

In its first year of operation the College Board held 
essay examinations in nine subjects: English, French, 
German, Latin, Greek, history, mathematics, chemistry, 
and physics. The definition of the requirements in each 
subject was taken from the recommendation of the 
professional association in each subject-matter area. 
The substance of each examination was then determined 
by a carefully selected committee of examiners, con- 
sisting of well-known teachers and scholars in the lead- 
ing colleges and secondary schools in the East, who met 
to confer on the general content and structure of the 
examinations and to decide on the specific questions to 
be asked. After extensive preparation, the first examinations 
were finally administered during the week of 
June 17, 1901, to 973 candidates at 69 testing centers. 
Thus, although the Board was as yet far from being a 
national institution (758 of the 973 candidates were 
seeking admission to either Columbia or Barnard), it 
was at least a going operation with the beginnings of a 
mechanism for continuation. 

As with the committees of examiners, committees of 
readers were also chosen with care. These committees, 






one in each subject, were assembled in the Columbia 
University Library and graded the papers in accordance 
with a procedure that had been worked out and agreed 
on in advance. A total of 7,889 examination papers, 
averaging over eight papers per candidate, were read 
and graded on a percentage-type scale, in which the 
designations Excellent, Good, Doubtful, Poor, and 
Very Poor were attached respectively to the ratings 
90-100, 75-89, 60-74, 40-59, and below 40. Special attention 
was given to papers that were originally given 
grades below 60. These were always reread and often 
discussed at length. 
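
The percentage-type scale just described can be written out as a small lookup. The following is only an illustrative sketch: the function names `designation` and `needs_reread` are hypothetical, and the treatment of the band boundaries as whole-number floors is an assumption drawn from the ranges quoted above, not a documented Board procedure.

```python
def designation(percent):
    """Return the evaluative designation attached to a percentage grade."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage grade must be between 0 and 100")
    # Lower bound of each band, highest band first (ranges quoted in the text).
    bands = [(90, "Excellent"), (75, "Good"), (60, "Doubtful"), (40, "Poor")]
    for floor, label in bands:
        if percent >= floor:
            return label
    return "Very Poor"


def needs_reread(percent):
    """Papers originally graded below 60 were always reread."""
    return percent < 60
```

For example, a paper graded 74 would be labeled Doubtful and would not be flagged for rereading, while one graded 59 would be labeled Poor and reread.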

In the second year, Spanish, botany, geography, and 
drawing were added to the list of examination subjects. 
In 1902 the number of candidates rose to 1,362, a 40 
percent increase over the preceding year. Thereafter 
growth was regular and continuous. In the course of the 
first decade additional colleges became members of the 
Board, and by 1910, as a consequence of the growth in 
membership, the number of candidates had increased 
to 3,731. 

An important early development was the creation of 
a Committee of Review for the College Board as a 
whole. Its main function was to examine the requirements 
in each subject and arrange with the committees 
of examiners for modifying the requirements whenever 
this seemed desirable. It soon became clear that the 
very fact of the committee's existence had become an 
important factor in the development of the secondary 
school curriculum, for it had the effect of moving the 
schools and colleges toward a badly needed system of 
uniform standards. However, like the establishment of 
the Board itself, this change was not universally regarded 
as an unmixed blessing; many of the secondary 
schools and colleges regarded the movement toward 
uniform standards as a dangerous encroachment on 
their autonomy. In 1910, therefore, partly in response 
to this type of pressure, the evaluative designations for 
percentage grades that had been introduced in the first 
year of the Board's operations were dropped, and 
schools and colleges were left free to attach whatever 
evaluations they considered appropriate to the various 
numerical grades. 

During the second decade of the Board's existence the 
philosophy of examinations itself, especially for admission 
to college, began to change, gravitating toward the 
idea of "comprehensive examinations," in which students 
would not be asked to repeat the facts that they 
had learned in school but to demonstrate an understanding 
of the relation of discrete facts to one another, 
to generalize the facts into working principles, and to 
apply them to new and unexpected situations. This 
development (the "New Plan," as it was called) 
provoked violent objections from the conservatives 
who insisted that it would be impossible to prepare 
students for the examinations, that it would be difficult 
to grade the examinations, and that examinations of 
this sort would place a premium on superficial cleverness 
at the expense of scholarship. At the same time the 

"New Plan" also tended to discourage attempts to outguess 
the examinations and to predict the particular 
test questions that would appear in them. 

By 1925 the College Board was ready to enter a new 
era. Stimulated by the World War I Committee for 
Classification of Personnel in the Army and its work 
in the testing of "general intelligence,” the Board 
established a Commission to investigate the relevance 
of these new psychological tests to the problem of college 
admissions, and on its recommendation appointed 
an Advisory Committee of experts, including Carl C. 
Brigham, Henry T. Moore, and Robert M. Yerkes, to 
formulate a suitable test development approach. In 
April 1925 the Board accepted the recommendation 
that the psychological tests be administered in 1926 
and appointed a committee of five with Professor 
Brigham at its head to prepare and score the tests. 
Within a short time the Brigham Committee produced 
a manual on what they called the "Scholastic Aptitude 
Test,” explicitly distinguishing it from tests of achieve- 
ment in school subjects but disclaiming any intention 
to measure "general intelligence” or "general mental 
alertness.” In their preface to the manual they introduced 
a paragraph which expressed a point of view that 
is still regarded today as highly relevant to the use of 
test scores: 

"The present state of all efforts of men to measure 
or in any way estimate the worth of other men, or to 
evaluate the results of their nurture, or to reckon their 
potential possibilities does not warrant any certainty 
of prediction . . . This additional test now made available 
through the instrumentality of the College Entrance 
Examination Board may help to resolve a few 
perplexing problems, but it should be regarded merely 
as a supplementary record. To place too great emphasis 
on test scores is as dangerous as the failure properly to 
evaluate any score or rank in conjunction with other 
measures and estimates which it supplements.” 
(Brigham, 1926, pp. 41-45.) 

The first College Board Scholastic Aptitude Test 
(SAT), a multiple-choice examination for the most part, 
was added to the College Board Program and administered 
on June 23, 1926, to 8,040 candidates. During 
the first two years, it consisted of nine subtests: Definitions, 
Arithmetical Problems, Classification, Artificial 
Language, Antonyms, Number Series, Analogies, Logical 
Inference, and Paragraph Reading. In 1928 these 
were reduced to seven and a year later to six. In 1929 
Dr. Brigham decided that it had become necessary to 
divide the SAT into two separate sections, one measuring 
verbal aptitude and the other measuring mathematical 
aptitude. The decision to report two separate scores was 
made in order to give differential weight to verbal and 
mathematical aptitudes in accordance with the nature 
of the college to which the candidate was applying, and 
in some instances, in accordance with the nature of the 
curriculum within the college. 

The early 1930s brought additional changes. The 
Board moved still further toward developing comprehensive 
examinations in each subject that would call 
upon a candidate's ability to integrate material that he 
had learned from various sources in solving examination 
problems, rather than merely to recall and reproduce 
isolated bits of information. The Board also began to 
express more active interest in experimentation and 
tryout. They also concentrated on gaining more consistency 
in their operation. It had been observed, for 
example, that the number of candidates earning passing 
grades was fluctuating much too widely from one year 
to another. Since it seemed reasonable to assume that 
the candidate populations were more stable than were 
the difficulties of the tests, it was decided to fix the 
proportion passing each test and not allow it to vary 
from year to year as it had in the past. 
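
The idea of fixing the proportion passing, rather than the passing grade, can be illustrated with a short sketch. It is only an assumption-laden illustration: the function name `passing_cutoff`, the rounding rule, and the handling of ties are hypothetical choices, and the text does not say how the Board actually carried out the computation.

```python
def passing_cutoff(grades, proportion_passing):
    """Return the lowest grade that still passes when a fixed share must pass."""
    if not grades:
        raise ValueError("at least one grade is required")
    if not 0 < proportion_passing <= 1:
        raise ValueError("proportion_passing must be in (0, 1]")
    ranked = sorted(grades, reverse=True)
    # Number of candidates who should pass (assumed rounding rule).
    n_pass = max(1, round(proportion_passing * len(ranked)))
    # Candidates tied at the cutoff all pass, so slightly more than
    # the nominal share may pass when grades are tied.
    return ranked[n_pass - 1]
```

Under this rule, a form that happens to be ten points harder for every candidate simply yields a cutoff ten points lower, and the same candidates pass.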

The period of the middle 1930s was a trying one for 
the Board. The volume of June candidates had dropped 
over 35 percent in the six years from 1931 to 1936, and 
the Board was under serious criticism by the secondary 
schools that were chafing under the restrictions imposed 
on their curriculums. In response to pressures from the 
schools, the new requirements were broadened to stress 
general principles and large aspects of the curriculum 
rather than the detailed subject-matter material. 

Technical aspects of test construction also received 
increasing emphasis. For example, it was felt that the 
examinations should not represent a mere accident of 
the selection of questions, but a wide variety of areas 
within each subject, yielding a score that would adequately 
reflect the candidate's ability and training. It 
was also felt that attention needed to be given to the 
reliability of the reading process as well as to the 
methods of formulating the questions without sacrificing 
attention to the production of a set of examinations 
that would continue to have a wholesome influence on 
the schools. 

In 1937, for the first time, an additional administration 
was instituted, to be held in April, principally for 
scholarship applicants. For the first time also, wholly 
objective Achievement Tests were introduced at this 
administration, a move that accelerated the scoring 
and reporting process, improved test reliability, and 
permitted the examination of a greater variety of test 
content. In 1938 the April administration was extended 
to include applicants for admission to college who were 
not scholarship applicants. From then on the April 
administration gained in prominence until in 1940 the 
number of students taking the SAT in April was larger 
than the number taking it in June. Because of this increase 
in the relative size (and importance) of the April 
administration it was felt necessary to provide a means 
of comparing SAT scores on the April and June tests 
directly. Beginning in June 1941, then, the scores on 
every form of the SAT were equated directly to the 
scores on some preceding form of the SAT, and ultimately 
and indirectly to the April 1941 form. The group 
tested in April 1941 thus became the standardization 
group, defining the continuing scale in terms of which 
scores on all future forms of the SAT would be expressed. 



In 1937 the Achievement Tests were first reported on 
a scale with a mean of 500 and standard deviation of 100 
(like the scale which had been in use for the SAT since 
its introduction in 1926) and rescaled each year on the 
new candidate group. Two years later, beginning with 
the April administration, adjustments were made in the 
scales for each of the Achievement Tests in accordance 
with the level and dispersion of the group choosing to 
take that test, as reflected by the relevant (verbal or 
mathematical) section of the SAT. 
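
A scale with a mean of 500 and a standard deviation of 100, defined on a particular candidate group, amounts to a linear transformation of raw scores. The sketch below shows one way this might be done; the function name `to_standard_scale`, the use of the population standard deviation, and the absence of any rounding or truncation are assumptions, since the text does not give these details.

```python
from statistics import mean, pstdev


def to_standard_scale(raw_scores, target_mean=500.0, target_sd=100.0):
    """Linearly rescale raw scores so the group has the target mean and SD."""
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)  # assumed: population SD of the scaling group
    if group_sd == 0:
        raise ValueError("scores must vary to define a scale")
    return [target_mean + target_sd * (x - group_mean) / group_sd
            for x in raw_scores]
```

Rescaling "each year on the new candidate group" would correspond to recomputing the group mean and standard deviation from each year's candidates before applying the transformation.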

With the outbreak of World War II, Harvard, Yale, 
and Princeton decided as an emergency action to 
accelerate their program of studies and begin their col- 
lege year in June or July. This shift made it necessary 
for their 1942 candidates to take the April Achievement 
Tests, which were now all objective, instead of sitting 
for the six-day June Achievement Test program, which 
in 1941 still consisted of essay examinations. It soon 
became evident that in order to accommodate itself to 
these sudden changes, the Board would have to confirm 
a decision that had been under consideration even 
before the outbreak of hostilities and commit itself 
broadly to the plan that Harvard, Yale, and Princeton 
had adopted on an emergency basis. This development, 
which was originally intended as a temporary measure 
in response to the nation’s entry into the war, actually 
marked the end of the June essay-type examinations 
after continuous use for 41 years. 

At the same time, the Board adapted itself in other 
ways to the needs of the nation at war. The Board developed 
the V-12 Testing Program for use in the selection 
of high school graduates for officer candidate training. 
It developed tests for the U.S. Armed Forces Institute 
and the Army Specialized Training Program, and 
involved itself in other operational and advisory capacities 
to the Government. Toward the end of the war, in 
anticipation of the large numbers of veterans returning 
from the war who would be seeking postsecondary 
school education, it prepared tests designed for use in 
college admission of veteran applicants. It furnished 
tests for scholarship awards sponsored by the Westinghouse 
Company and constructed special tests for the 
Pepsi-Cola Scholarship Program. And it also assisted 
in the preparation of qualifying examinations for the 
Foreign Service, the Military Academy, the Naval 
Academy, the Coast Guard Academy, and the Bureau 
of Naval Personnel. 

In 1947, with the formation of Educational Testing 
Service (ETS) and a greater focusing of the Board's 
interests and activities on the transition from secondary 
school to college, special testing programs were turned 
over to ETS for management according to the Board's 
specifications. In subsequent years some of these 
programs, like the program for the military and naval 
academies, were incorporated within the general Admissions 
Testing Program of the College Board. (At the 
present time, all of the service academies are members 
of the College Board and require its tests of their applicants.) 
Special scholarship programs, similar to those 






previously offered by Westinghouse and Pepsi-Cola, 
also make use of the College Board tests and are managed 
in behalf of the Board by the College Scholarship 
Service staff at ETS. 

By now, some 70 years after its formation, the College 
Board has become a truly national organization. 
It has also broadened its perspective in an effort to 
respond to the greater variety of demands on the educational 
facilities of the country. In addition to the SAT 
and Achievement Tests, which are the principal tests 
of its Admissions Testing Program, the Board also 
offers the Preliminary Scholastic Aptitude Test, the 
Advanced Placement Examination Program, the College 
Placement Tests Program, the Comparative 
Guidance and Placement Program, and the College-Level 
Examination Program (CLEP, as it is known). The 
last is a program of credit by examination for unaffiliated 
students, and can also be used for evaluating 
students who want to transfer from the lower to the 
upper division of a college, or from one college to 
another. 

The College Board has also developed a program of 
aptitude and achievement tests in Spanish for Spanish-speaking 
students who may be applying for admission 
to universities in Puerto Rico or in the continental 
United States. It has developed an English-language 
aptitude test for African students applying for scholarships 
in American universities (the ASPAU program). 
It offers, with the joint sponsorship of ETS, the Test of 
English as a Foreign Language (TOEFL), which measures 
the proficiency of foreign students in the English 
language. The Board also operates the College Scholarship 
Service, which provides a service for determining 
the need of students applying for financial aid and 
assists in the administration of scholarship programs. 
Finally, the Board has a program of guidance services, 
a program of seminars for admissions officers and 
another for guidance counselors, and a validity studies 
computation service. 

At the present time, as opportunities for postsecondary 
education are being widely extended to 
youths who in the past would not have been college-bound, 
the Board through its Commission on Tests is 
re-examining its test offerings to see if they may be 
modified and extended to assess a wider range of talent. 

Administrative considerations 

The size and complexity of the Admissions Testing 
Program and the variety of the demands upon it 
generate a great range of administrative and psychometric 
problems, each requiring special solutions. This 
section of the chapter will describe the general nature 
of the program, the problems and conditions of its area 
of operation, and its administrative solutions. Similarly, 
the next section will describe its technical solutions for 
the administrative and technical demands of the pro- 
gram and will set the stage for more detailed discussions 
in subsequent chapters. 







Nature of the program 

The College Board Admissions Testing Program, as of 
the 1968-69 year, consisted of a Scholastic Aptitude 
Test (the SAT) and 15 Achievement Tests covering 
English (Composition and Literature), six foreign 
languages (French, German, Hebrew, Latin, Russian, 
and Spanish), two branches of the social studies (American 
History and Social Studies, and European History 
and World Cultures), two levels of mathematics (Level 
I and Level II), and three sciences (Biology, Chemistry, 
and Physics). The SAT is a three-hour test that yields a 
verbal score and a mathematical score. The Achievement 
Tests are each one hour in length. They are given 
in a single three-hour test session, during which a candidate 
may take any one, two, or three tests at one 
sitting. Each test yields a single score. 

During the 1968-69 academic year, the program's 
tests were administered as follows: The SAT was given 
in the morning on six Saturdays during the academic 
year (in November, December, January, March, May, 
and July); the Achievement Tests were offered in the 
afternoon of each of these dates, except the one in 
November. Sunday sessions were also provided after 
each of the six Saturday test dates to accommodate 
candidates who observe the Sabbath on Saturday. 
Depending upon college admission requirements, a 
candidate could take the SAT on one date and the 
Achievement Tests on another date, although both 
types of tests may be taken on the five dates when both 
are offered and many candidates do take them both 
on the same date. Virtually all candidates take the SAT, 
and in the 1968-69 year about 40 percent of them took 
the Achievement Tests as well. 

As an adjunct to the Admissions Testing Program, a 
series of Supplementary Achievement Tests was given 
on a single date in February in any secondary school 
that wished to administer the tests to its own students. 
The series consisted of five 30-minute listening comprehension 
tests in five foreign languages (French, 
German, Italian, Russian, and Spanish), a 90-minute 
free-response test in Greek, and a 60-minute objective 
test in Italian. They were available to candidates who 
registered for the regular Achievement Tests during 
the testing year.* 



*Beginning in the 1970-71 year, new listening-reading 
Achievement Tests in French, German, Italian, Russian, 
and Spanish were introduced into the regular Admissions 
Testing Program. The only Supplementary Achievement 
Test offered in 1970-71 is the Greek Test. For detailed 
information on the scope and content of the Admissions 
Testing Program, see the latest editions of the following: 
Bulletin of Information, Scholastic Aptitude Test, Achievement 
Tests; A Description of the College Board Scholastic Aptitude 
Test; and A Description of the College Board Achievement 
Tests. These may be obtained free of charge by writing to: 
College Entrance Examination Board, Box 592, Princeton, 
N.J. 08540. 



As of the 1968-69 academic year, 834 colleges were 
members of the College Board, and practically all of 
them required their applicants to take at least the 
Scholastic Aptitude Test. About 350 of the then mem- 
ber colleges required some or all of their applicants to 
submit Achievement Test scores as well, either for use 
in admissions decisions or for course placement, or for 
both purposes. Some colleges specified the particular 
Achievement Tests to be taken, but most permitted 
the applicant some freedom of choice. Although there 
was considerable variation among colleges as to the 
latest test date for which they would accept test scores, 
there was an increasing trend toward acceptance of 
earlier (junior-year) SAT and Achievement Test scores 
for use in admissions decisions, whether or not the 
student was applying under an early decision plan. 
Many students also took the Admissions Tests in March 
or May of their junior year for guidance and practice 
and repeated them in December or January of their 
senior year. Many, perhaps most, of these students 
would also have taken the Preliminary Scholastic 
Aptitude Test (PSAT), which is offered in the fall as a 
semisecure test, principally for secondary school 
juniors for purposes of guidance and familiarization 
with the SAT type of test. 

Typically, close to 40 percent of the candidates taking 
the SAT in December or January in their senior year 
have taken the test once before in March or May of 
their junior year. One result of this fact is that the 
students and their parents, as well as the schools and 
colleges receiving score reports, have an opportunity to 
observe a wide variation in score changes (gains as well 
as losses) from the first to the second testing, and many 
write to the Board and to ETS asking for an explanation. 
These queries have become more frequent because of 
the increasingly popular practice of taking the tests 
twice and because of a general downward trend in 
average score gains in recent years. The task of explain- 
ing the problem of score change is not easy because the 
explanation rests so heavily on considerations of error 
of measurement— a concept which is neither well under- 
stood nor easily accepted by the nonstatistically 
oriented. However, special efforts have been made to 
clarify the problem of score change by conducting 
theoretical and empirical studies, and then describing 
the findings to test users in special score interpretation 
booklets. (See Chapter II of this book.) The results of 
these studies of score change are reported in Chapter VI 
of this book. 

Although test scores frequently play an active role 
in college admissions, their use varies considerably from 
one college to another. In some colleges formal predic- 
tion equations are calculated, utilizing the secondary 
school record, SAT scores, and sometimes Achievement 
Test scores as predictors. The prediction equations are 
then used in conjunction with recommendations, prizes 
and awards, nonacademic information about the appli- 
cants, and results of interviews. Other colleges use 
College Board scores in a less formal way. Some use the 



scores to decide only about marginal candidates, and 
some (principally certain of the state universities) use 
them only for out-of-state candidates. Finally, some 
colleges use the test scores, not for admission at all, but 
for purposes of counseling and guidance and for place- 
ment. 
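
A formal prediction equation of the kind mentioned above is ordinarily a least-squares weighting of the predictors. The following is a minimal sketch under stated assumptions: the function names `fit_prediction_equation` and `predict` are hypothetical, the criterion is taken to be a college grade average, and ordinary least squares is assumed as the fitting method; the text does not specify how any particular college computes its equation.

```python
def fit_prediction_equation(rows, outcomes):
    """Least-squares weights for: outcome = b0 + b1*x1 + b2*x2 + ..."""
    # Design matrix with a leading 1.0 for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, outcomes))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or data are too few")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(weights, row):
    """Apply fitted weights to one applicant's predictor values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))
```

Each row here would hold one enrolled student's secondary school record and SAT verbal and mathematical scores; the fitted weights could then be applied to applicants, with the result considered alongside the nonstatistical information the text lists.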

Because of the number of times a candidate may, if 
he wishes, repeat the tests, it is necessary that the num- 
ber of active secure forms be sufficiently large to permit 
relatively infrequent reuse of a given form. To keep the 
pool of active forms large enough to meet this degree of 
flexibility, several new forms of the SAT and of each 
Achievement Test are introduced into the program 
each year. 

All told, the number of candidates participating in 
the program during 1968-69 was about 1,950,000. They 
took the tests in over 4,500 centers, of which about 300 
were located in some 100 foreign countries on all six 
continents. 



Administration of the tests 

Administrative problems are generally of three principal 
kinds: (1) those involved in bringing together the candi- 
dates and the tests at the proper time, (2) those involved 
in giving the tests under standard conditions that 
guarantee maximum protection against dishonesty and 
breaches of test security, and (3) those involved in 
scoring the tests accurately and reporting the results 
correctly and in sufficient time for appropriate use by 
the colleges and schools. 

Candidates are informed about the program through 
a series of publications. The basic document explaining 
the procedures to follow is the Bulletin of Information, 
which is updated annually and distributed in appro- 
priate quantities to secondary schools before the begin- 
ning of the school year. It contains information on the 
test offerings, the times and places where the tests are 
given, and methods for registering for the tests. It also 
refers candidates to The College Handbook, published 
by the Board, the 1969 edition of which contains state- 
ments compiled by 832 colleges that are members of the 
Board. The statements include both general facts about 
the colleges and descriptions of specific college characteristics, 
such as their admission and examination requirements, 
cost of attendance, financial aid programs, 
and, in some cases, information and statistics about the 
academic qualifications of enrolled students. Booklets 
describing the SAT and the Achievement Tests are also 
distributed in bulk to all secondary schools so that every 
candidate can obtain a preview of what will be expected 
of him in the testing. 

The candidate registers for the tests by filling out a 
registration form with the necessary information about 
himself and indicates the tests he plans to take and the 
scheduled date on which he will take them. If he plans 
to take Achievement Tests, he need not specify the 






particular tests until he actually takes them.* From 
lists in the Bulletin of Information, the candidate selects 
the center where he will take the tests and the colleges 
to which he wants his scores sent, and enters this information 
on the registration form. Once registered, the 
candidate receives an admission ticket to the test 
center. The ticket carries, along with other information, 
the registration number he is to use on the test answer 
sheets. 

Test centers for each administration are usually 
established a year in advance at secondary schools and 
colleges, and if possible are located within 75 miles of 
any candidate. The supervisor of a test center is usually 
an official of the institution where it is located. Seven 
to ten days in advance of a test date the supervisor 
receives the necessary tests and supplies. One week in 
advance he receives a roster of the names of candidates 
assigned to the center for the test. The conjunction of 
tickets and rosters provides a final check on whether 
the right candidates are taking the right tests at the 
right time in the right center. 

The effort to insure that tests are administered 
properly begins with the appointment and instruction 
of the test center supervisors. Each supervisor is given 
a Supervisor's Handbook and a Supervisor's Manual, and 
as a member of the temporary staff of Educational 
Testing Service he is expected to master both documents 
and follow the directions in them. The Supervisor's 
Handbook sets forth the general requirements of 
test center management: for example, type of facilities 
and staff needed, handling of test materials, seating of 
candidates, timing, and maintaining test security. The 
Supervisor's Manual is specific to the tests to be administered 
on a given date. It contains detailed rules for 
conducting the test sessions, verbatim directions to 
candidates, and forms for reporting on several aspects 
of the administration including any irregularities that 
may occur. 

Operations of the test centers are continually reviewed 
by staff members of Educational Testing Service. 
If testing conditions at a center are found to be inadequate, 
appropriate adjustments are made (including 
replacing the supervisor if necessary). 

The number of candidates involved in any single 
administration of the tests has so far run as high as 
632,000. Within four to five weeks after an administration 
the test scores are sent to the colleges. During this 
interim period, the task of translating the marks on 
test answer sheets into interpretable score reports is 
accomplished largely by high-speed scoring and data 
processing equipment. The task is complicated by the 
fact that a candidate may have taken any one of many 
possible combinations of Achievement Tests, for 


*In the 1970-71 year, a student who plans to take one of the 
new listening-reading foreign language Achievement Tests 
will have to specify on his Registration Form which languages 
he wants to be tested in. 







example, French, American History, and Physics, or 
English, Chemistry, and Mathematics Level I. It is 
further complicated by the fact that the English Com- 
position Test may contain an essay section that must 
be graded by readers. The fact that the score reports 
are now cumulative over a two-year period adds still 
another dimension of complexity to the task of score 
reporting. 

To insure the greatest possible accuracy in this scoring 
and reporting operation a series of checks is built 
into the data processing system. The system is programmed 
to reject cases for which the information 
needed for identifying the candidate or the tests he has 
taken is inadequate or questionable, or for which the 
markings on any answer sheet are such as to throw 
doubt on the score. These rejected cases, some seven 
percent of the total, are scored by hand. In addition, 
a continuous quality control procedure, based on a 0.3 
percent random sample of all the cases, provides a 
constant outside check on the total system. Studies of 
scoring accuracy show that over 99 percent of the SAT 
and Achievement Test answer sheets are scored with no 
errors and that errors that do occur are almost uniformly 
one-point errors in raw score (six to ten points on the 
standard scale). All answer sheets are kept on file for 
one year should any question arise concerning the 
accuracy of a score. 
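
The two checks described above, rejection of doubtful cases for hand scoring and a 0.3 percent random sample for quality control, might be sketched as follows. This is an illustration only: the field names (`candidate_id`, `test_code`, `ambiguous_marks`) and both function names are hypothetical, and the actual rejection criteria and sampling scheme of the data processing system are not given in the text.

```python
import random


def needs_hand_scoring(sheet):
    """Reject a case when identification or answer markings are in doubt."""
    return (not sheet.get("candidate_id")
            or not sheet.get("test_code")
            or sheet.get("ambiguous_marks", 0) > 0)


def quality_control_sample(case_ids, rate=0.003, seed=None):
    """Draw a simple random sample of cases for independent rescoring."""
    cases = list(case_ids)
    if not cases:
        return []
    rng = random.Random(seed)
    # Sample size for the stated rate (assumed: at least one case).
    size = max(1, round(rate * len(cases)))
    return rng.sample(cases, size)
```

With 10,000 cases and the stated rate, the sample would contain 30 cases, each of which would be rescored independently and compared with the machine result.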

Test scores are accumulated from grade 10 through 
grade 12 for high school students, and for one year for 
candidates not in high school. To insure that a candidate's 
scores will be collated with those he may have 
earned at earlier administrations, identification is 
required of the candidate each time he takes the tests: 
name (expressed precisely the same on all occasions), 
sex, and birth date, for example, and social security 
number when available. Failures to secure precisely 
uniform information of this sort from repeating candidates 
occur in about 5 percent of the cases in any given 
series. In such cases only the candidate's current scores 
will appear on his report. Except for the special case of 
twins who do not have social security numbers, the 
additional controls on the data are sufficiently rigorous 
so that the probability that one candidate's scores will 
be mismatched with those of another candidate is 
practically zero. 
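
The collation of a candidate's scores across administrations depends on exact agreement of the identifying information. The sketch below assumes a simple exact-match key built from the fields named in the text; the record layout and the names `identity_key` and `collate` are hypothetical, and the "additional controls" mentioned above are not modeled.

```python
from collections import defaultdict


def identity_key(record):
    """Matching key: name, sex, birth date, and social security number if any."""
    return (record["name"], record["sex"], record["birth_date"],
            record.get("ssn"))


def collate(records):
    """Group score records whose identifying information agrees exactly."""
    histories = defaultdict(list)
    for record in records:
        histories[identity_key(record)].append(record["scores"])
    return dict(histories)
```

Under such a rule a candidate who writes his name differently on a second occasion produces a new key, which is consistent with the statement that in such cases only the current scores appear on the report.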



Influence on curriculum 

Because of its involvement in the important process of 
transition from high school to college, the College 
Board, through its Admissions Testing Program, is in 
an unusually strategic position to exert a significant 
influence on American secondary school education, 
because many of the secondary schools tend to gear 
their curriculum to what they expect will appear in the 
next forms of the tests. This academic fact of life leads 
to a key problem for Board policy and planning. If, on 
the one hand, Board tests fail to keep abreast of new 



trends in curriculum, then the Board is considered 
derelict in its responsibility to represent what some will 
consider to be the best in American secondary educa- 
tion. If, on the other hand, the Board tests respond too 
quickly to the pressures exerted by overly enthusiastic 
exponents of educational philosophies still in the process 
of development, then the Board is considered guilty of 
leading American secondary education along ill-advised, 
unproved, and dangerous pathways. Consequently, the 
following philosophy has guided the Board, particularly 
since the early 1940s, in the selection of content for 
its tests: that its tests are constructed, not with a single 
syllabus in mind, but with a blueprint based on sampling 
widely and fairly the variety of courses represented in 
the prominent and well-subscribed curriculums through- 
out the country. 



Coaching 

It is for this reason (the philosophy just described) 
that the publications of the Board have maintained 
that special coaching for the examinations is neither 
needed nor advised, and that little is gained by trying 
to outguess and predict the content of the examinations. 

The efforts of the Board in regard to coaching have 
been directed not only to the students who would seek 
quick and easy answers to the substantive test ques- 
tions on the Achievement Tests, but also, and more 
particularly, to those who hope to improve their scores 
on the far less curriculum-oriented test questions that 
form the basis of the SAT. During the 1960s, with rapid 
increases in the numbers of students seeking admission 
to college and the adoption of increasingly stringent 
selection procedures by many of the colleges, the College 
Board tests have come to be regarded by many students 
as barriers to college that must, at all costs, be over- 
come. In response to what appeared to be an eager and 
growing demand, a number of publishers have market- 
ed collections of test items purportedly representative 
of (and, in some instances, apparently taken directly 
out of) actual forms of the SAT. A parallel development 
has been the rise of coaching or cramming schools that 
guarantee high score benefits to students who enroll in 
their short-term courses. 

Appalled by the subversive effect of these commercial 
enterprises on the goals of education, the Board has 
prepared a special booklet (College Board, 1968) 
describing the effects of coaching. The booklet outlines 
the findings of seven studies conducted in this area, 
four by the College Board-ETS staff and three by 
independent researchers, which showed at best only 
small and insignificant gains on the SAT resulting from 
short-term intensive coaching. As a result of these 
studies the College Board trustees have prepared a 
statement, which is reproduced in the booklet, that 
urges students and their parents not to waste their time 
and money on this kind of coaching. 







Security 

Since scores on College Board Admissions Tests play 
an important role in many colleges’ admissions deci- 
sions, it is not surprising that a few candidates try to 
improve their scores by cheating. To prevent cheating, 
Educational Testing Service has stringent security re- 
quirements for every phase of a test administration. 

Tests are printed by carefully selected printers and 
shipped by traceable means. Every test book is sequen- 
tially numbered, sealed, and inserted in units of five or 
ten in transparent plastic bags which the test super- 
visor is instructed not to open until testing is about to 
begin. No one is permitted to examine the test contents 
before or after the administration. Only the candidates 
see the test questions, and then only during the actual 
administration. All test books, used and unused, must 
be returned immediately after the testing to ETS, where 
the books are counted to insure that all have been 
returned. 

Educational Testing Service also gives test super- 
visors specific instructions for the admission of candi- 
dates to test centers and their assignment to seats at 
specified distances from other candidates. Constant 
proctoring is required throughout the test session. How- 
ever, despite these precautions, a test supervisor occa- 
sionally observes cheating. More often, evidence of 
cheating is found later when an institution has ques- 
tioned a candidate's scores because of inconsistency 
with the rest of his academic record. 

In all investigations of individuals' scores, the sole 
concern of the Board and ETS is to establish whether 
the scores in question were or were not earned by the 
candidate without unauthorized assistance. A candidate 
who is suspected of having cheated is offered a retest 
without prejudice, that is, without reporting to the 
institutions designated by the candidate to receive his 
subsequent scores that the investigation has been made. 
The results of the retest are used to confirm, or to 
invalidate, the scores reported for the original test. 

It is evident that all the above-mentioned influences 
(the attempts to shape the curriculum to reflect the 
content of the tests, to coach and cram for tests, and 
finally, to find extralegal or otherwise unethical means 
of inflating the test scores) derive from an unrealistic 
and exaggerated view of the importance of test scores 
on the part of students and test users. Hence, the Board 
in its various publications and its frequent personal 
contacts with test users has repeatedly tried to put 
test scores in their proper perspective. It advises the 
colleges that the tests should not be overemphasized 
and that the test results should not constitute the sole 
basis for evaluating the probable future success of a 
candidate, but should be considered along with other 
relevant factors such as the school record, the recom- 
mendations of teachers, the record of special prizes and 
awards, extracurricular activities, and frequently the 
observations resulting from interviews. 






Psychometric considerations 

The psychometric problems of the Admissions Testing 
Program are similar to those found in the production 
and interpretation of any battery of tests having mul- 
tiple forms. The complicated nature of the program 
and the many populations and purposes it must serve, 
however, demand solutions that are somewhat out of 
the ordinary. The following paragraphs will briefly 
define the situation with respect to: 1) the validity of 
the tests, 2) their reliability, 3) parallelism among the 
forms of any given test, 4) the development and main- 
tenance of a score scale, 5) the development of norms, 
and 6) miscellaneous factors that must be taken into 
account in interpreting scores. 



Validity 

From the standpoint of many of the colleges that re- 
quire their candidates to take one or more of the tests, 
the main purpose of the tests is to help select a fresh- 
man class that is academically qualified. A secondary 
purpose, but one that is increasingly important, is to 
help in the guidance and course placement of admitted 
freshmen. 

The validity of the tests for purposes of selection is 
indicated by answers to two kinds of questions: 1) To 
what extent can the test scores increase the accuracy 
of predicting the candidate's academic standing in 
college? 2) How well do the test scores reflect the 
quality of the candidate's academic performance in 
secondary school? A third question bearing upon the 
validity of the tests is also occasionally asked, namely, 
how well do the test scores help in forecasting the quality 
of an individual's eventual contribution to society? 

The answers to all three questions determine the 
emphasis a college places on test scores in deciding 
whether or not to admit a particular candidate. In this 
connection much depends upon the admissions policy 
of the college, upon its translation of policy into rational 
action, and upon the quality of the candidates it 
attracts, all of which suggests that no single set of 
operations can define the validity of the Admissions 
Tests for purposes of candidate selection. Some colleges 
are mainly interested in the prediction of academic 
standing. Others may disregard prediction altogether 
and settle for describing the candidate not in terms of 
what he may become but in terms of what he is. Still 
others may be less concerned with the candidate's 
present academic status, or with his chances of making 
good grades, than with identifying those qualities that 
are presumed to be basic to a satisfying and fruitful 
career in the post-college world. The relative merits of 
these several approaches to selection, or any combina- 
tions of them, are not here under discussion. The 
validity question is to what extent the Admissions 
Testing Program can provide measures that are relevant 
to these approaches. 







The validity of the tests for purposes of course place- 
ment has received relatively little systematic study. 
There are, however, two basic models for such studies. 
In one case the scores on appropriate tests, taken at 
entrance to college, are correlated with final grades in 
particular freshman courses to see how accurately they 
predict performance in these courses. In the other case, 
the tests are given to enrolled students upon completion 
of each of several courses in a sequence (for example, 
first semester French, second semester French, third 
semester French, etc.) to determine, first, how closely 
the test scores approximate the level of performance in 
each course as measured by concurrent criteria such as 
final examination grades, and secondly, how well the 
tests measure the progress of students from each course 
to the next above it. Here again, the validity of the 
tests for placement will vary from one type of course 
to another within a given college, as well as from college 
to college, in accordance with the way freshman in- 
struction in any given field may be organized and in 
accordance with the level and content of the subject 
matter. 

Studies bearing on these questions are examined in 
detail in Chapter V. At this point, it is worth noting 
that the meaning of validity in the context of the 
Admissions Testing Program is not confined solely to 
predictive validity. Validity can mean an appropriate 
content balance as well. The content of the College 
Board tests has from the very beginning been governed 
by committees of examiners who are chosen from the 
faculties of secondary schools and colleges representing 
a wide geographic distribution and a variety of content 
interests and educational emphases. These committees 
meet to decide on numbers of each type of item to be 
included in the tests, the distribution of item content, 
the degree to which the items represent various current 
curricular emphases, and the nature of the rational or 
psychological processes required by the item. Care is 
taken, in their specifications for the test, not only to 
maintain an appropriate content balance, but also a 
balance in the difficulty of the concepts that are being 
examined. 



Reliability 

The reliability of each form of each test in the Admis- 
sions Testing Program is routinely estimated after its 
first formal administration, ordinarily by means of the 
Kuder-Richardson formula 20, adapted for use with 
formula scoring. Each such estimate is ordinarily 
based on a specially selected sample of at least 
900 cases when available, and sometimes numbering 
as many as 2,000 cases. From the standpoint of the 
test user, however, the reliability coefficient and error 
of measurement associated with any particular form 
of a test in the program can be of little more than 
academic interest. In a situation where the specific 
form of test a candidate may take is essentially un- 
predictable, the important question is whether the 
general level of reliability is sufficiently high and 
reasonably uniform to permit the test user to operate 
on the scores from any unspecified form with an even 
amount of confidence. As subsequent tables in this 
book will show, the level of reliability as estimated by 
K-R 20 across the active forms of any particular test 
is reasonably uniform. 
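
As a rough illustration of the kind of estimate described above, the following sketch computes the ordinary Kuder-Richardson formula 20 for items scored 1 or 0, together with the corresponding standard error of measurement. It is a sketch only: the adaptation for formula scoring mentioned in the text is not reproduced here, and the function names and the small response matrix are hypothetical.

```python
from math import sqrt
from statistics import pstdev, pvariance

def kr20(responses):
    """Ordinary KR-20 for a list of examinees, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)
    # Sum over items of p*q, where p is the proportion passing the item.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in responses) / n_people
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)

def standard_error_of_measurement(score_sd, reliability):
    """Standard error of measurement implied by a reliability coefficient."""
    return score_sd * sqrt(1.0 - reliability)

# Hypothetical responses of five examinees to four items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(data)
sem = standard_error_of_measurement(pstdev([sum(p) for p in data]), reliability)
```

A real analysis would of course rest on the several hundred answer sheets the text describes rather than on five examinees.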



Parallelism of forms 

For all their consistency, the estimates of reliability do 
not, of course, provide sufficient information in them- 
selves concerning the amount of error variance that 
must be allowed for in interpreting scores. It is neces- 
sary to assume, in addition, that the forms of any 
particular test are parallel in respect both to content 
and difficulty (Lord, 1964). To the extent that this 
assumption is not met, the actual standard error of 
measurement for a score irrespective of form will be 
slightly larger than that reported for a score on any one 
form. The fact of the matter is that the assumption of 
parallelism does not in all probability hold exactly from 
form to form, and the degree to which it does hold 
probably varies from one test in the Admissions Testing 
Program to another. Thus, because of the gradual 
changes that are introduced into the Achievement 
Tests in response to changes in curriculum emphasis, 
successive forms of the Chemistry Test, for example, 
cannot be regarded as having the same degree of 
parallelism as do successive forms of the SAT. According- 
ly, the difference between the actual and the reported 
standard error of measurement for the Chemistry Test 
is very likely to be slightly larger than the difference 
between the actual and the reported standard errors 
of measurement for either score on the SAT. 

The effort to achieve parallelism among the forms 
centers on the definition of the test specifications and on 
the development of test forms that adhere to the 
specifications. The specifications for any given test in 
the program consist of three principal elements: 1) the 
distribution of item difficulty, 2) the distribution of 
item-test correlations, and perhaps most important, 
3) the distribution of item content. 

Indexes of item difficulty and coefficients of item-test 
correlation are routinely prepared and yield information 
about parallelism in successive forms of a test in respect 
to their overall difficulty and homogeneity. However, 
these statistics provide no information about the 
parallelism of item content. This aspect of the specifica- 
tions is necessarily less rigorous for two reasons. First, 
it depends upon verbally defined categories of topics 
and processes that leave room for ambiguities of 
interpretation on the part of the item writers. Secondly, 
with the accumulation of new data on the changing 
nature of the candidate population, the secondary 
school curriculums, and the intellectual demands of the 
colleges, there is a continuous need to change the con- 
tent of the tests. In short, strict parallelism in test con- 
tent, even if it were attainable, would tend over time to 
bring about a reduction of validity. This problem is met 
by a conscious compromise in the content specifications: 
changes in content are introduced slowly so that the 
active forms in use over any five-year period are ap- 
proximately interchangeable as far as content is con- 
cerned. The rate of change in the content of the SAT is 
less rapid than that of most of the Achievement Tests. 
For this reason, although the SAT itself is constantly 
evolving, the process is sufficiently slow to permit it to 
be regarded in a very real sense as the principal stabiliz- 
ing force in a changing and developing Admissions 
Testing Program as it develops over the years. 



Item analysis 

By means of a detailed program for pretesting and item 
analysis it has been possible to assess the difficulty and 
discriminating power of the items, to select items of 
appropriate statistical characteristics, to diagnose 
sources of ambiguity and detect reasons for failure to 
provide adequate discrimination, and in general to 
exert a significant degree of control over the statistical 
properties of the test forms. 

Although the analyses of pretest items yield a variety 
of information, the principal statistics are the indexes 
of item difficulty and item discrimination. The index of 
item difficulty, referred to as "delta," is a function of 
the percent passing the item. It is the normal deviate of 
the point above which lies the proportion of the area 
under the curve equal to the proportion of correct 
responses to the item. It is expressed in terms of a scale 
whose mean is 13 and whose standard deviation is 4. 
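
The definition of delta given above can be written out as a short computation. The sketch below takes the scale mean as 13 and the standard deviation as 4 and uses the standard library's normal distribution; the function name is hypothetical, and the handling of items that everyone or no one passes is an assumption of the sketch rather than anything stated in the text.

```python
from statistics import NormalDist

DELTA_MEAN = 13.0  # mean of the delta scale
DELTA_SD = 4.0     # standard deviation of the delta scale

def delta_from_proportion_correct(p):
    """Convert the proportion answering an item correctly to a delta value."""
    if not 0.0 < p < 1.0:
        # Assumption of this sketch: delta is left undefined at the extremes.
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    # Normal deviate of the point ABOVE which lies a proportion p of the area.
    z = NormalDist().inv_cdf(1.0 - p)
    return DELTA_MEAN + DELTA_SD * z

middle_item = delta_from_proportion_correct(0.50)
easy_item = delta_from_proportion_correct(0.84)
hard_item = delta_from_proportion_correct(0.16)
```

On this scale an item passed by half the group has a delta of 13, easier items fall below 13, and harder items fall above it.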

Since pretest samples, especially those used for the 
achievement pretests, may differ in level of ability from 
the standard reference group that was originally used 
to scale the tests, it is necessary to render the raw, or 
observed, deltas obtained from successive pretestings 
comparable to one another by an equating procedure. 
This procedure requires that the pretest material con- 
sisting of new items be administered together with a 
number of previously used items whose standard, or 
equated, deltas (indexes of difficulty estimated in terms 
of the performance of the standard reference popula- 
tion) are known. For each such item an observed delta 
is calculated based on the pretest sample. When the 
new observed delta values are plotted against the 
equated delta values on arithmetic graph paper, the 
resulting scatter plot typically falls in an elongated, 
narrow ellipse representing a high correlation, 
frequently in the upper .90s. The line defined by 
this plot, calculated from the means and standard 
deviations of the observed and equated deltas, is used 
to convert the item difficulties for the pretest samples 
(observed deltas) to item difficulties as equated for 
the standard reference population (equated deltas). 
Although the observed deltas lack comparability, 
since they are dependent on the abilities of the various 
groups to whom the items have been administered, the 
equated deltas are all defined in terms of the same 
standard reference group and are therefore directly 
comparable (Thurstone, 1947). 
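
The conversion line described above, fixed by the means and standard deviations of the observed and equated deltas of the previously used items, might be sketched as follows. The function names and the four anchor-item values are hypothetical, and the use of population standard deviations is an assumption; the text does not say which form of the standard deviation was used.

```python
from statistics import mean, pstdev

def delta_equating_line(observed, equated):
    """Slope and intercept of the line through the means and SDs of the
    observed and equated deltas of the previously used (anchor) items."""
    slope = pstdev(equated) / pstdev(observed)
    intercept = mean(equated) - slope * mean(observed)
    return slope, intercept

def equate_delta(observed_delta, slope, intercept):
    """Convert an observed delta to the standard reference population."""
    return slope * observed_delta + intercept

# Hypothetical anchor items: deltas observed in the pretest sample and
# their known equated deltas.
anchor_observed = [9.0, 11.0, 13.0, 15.0]
anchor_equated = [10.0, 12.5, 15.0, 17.5]

slope, intercept = delta_equating_line(anchor_observed, anchor_equated)
new_item_equated = equate_delta(12.0, slope, intercept)
```

In this made-up case the pretest group is abler than the reference group, so every item looks easier in the pretest and its equated delta comes out higher than its observed delta.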

In the assembly of each new form of a test, care is 
taken to bring the mean and standard deviation of the 
equated deltas as close as possible to those of previous 
forms, except, of course, when the population shifts 
and it is deemed necessary to adjust the difficulty of the 
new form to make it appropriate to the new population. 

A second statistic that is regularly calculated for 
every item is the biserial coefficient of correlation be- 
tween the item and an appropriate criterion, usually 
the total score on the test. These biserial correlations 
are used for three purposes: first, to flag items that may 
be ambiguous; second, to assess the worth of items as 
discriminators; and third, to provide a basis for check- 
ing on the degree of homogeneity among the items of 
the test. It is this last consideration that has a bearing 
on the parallelism among test forms. Since a certain 
degree of heterogeneity is regarded as desirable in any 
test in the program, and is in part predetermined by 
the content specifications of the test, some items with 
relatively low biserials as well as some items with 
relatively high biserials may be included whenever a 
new form is being put together. Consequently, the 
degree to which the means and standard deviations of 
the biserials of the forms agree becomes an important 
consideration in estimating the parallelism among the 
several forms. 
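
For readers who want the computation spelled out, the following sketch gives the usual textbook form of the biserial coefficient between a right-wrong item and the total score. The text does not state the exact computing formula used in the program, so this form, the function name, and the eight sample examinees are assumptions of the sketch.

```python
from statistics import NormalDist, mean, pstdev

def biserial(item_scores, total_scores):
    """Biserial correlation between a 0/1 item and a continuous criterion."""
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(passed) / len(item_scores)
    q = 1.0 - p
    # Ordinate of the unit normal curve at the point dividing p from q.
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(q))
    return (mean(passed) - mean(failed)) / pstdev(total_scores) * (p * q / y)

# Hypothetical data: item responses and total scores for eight examinees.
item = [1, 0, 1, 1, 0, 1, 0, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3]
r_bis = biserial(item, totals)
```

The means and standard deviations of such coefficients over all items of a form are the quantities the text proposes to compare from form to form.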

The data routinely produced in the item analysis 
yield additional information regarding the behavior of 
the item, and are particularly useful in making revi- 
sions. These include the number of people choosing 
each option of the item, the percent passing the item, 
the number of people omitting the item, the number 
not reaching the item (presumably because of insuffi- 
cient time), the mean total score on the test for those 
choosing each option (as well as for those omitting it 
and for those not reaching it), and the number of people 
in each of five ability groups (as defined by the total 
score on the test) choosing each option. From these 
frequencies and means it is possible to determine, for 
example, whether the item is appropriately difficult for 
the group taking it and whether it is appropriately 
placed in the sequence of items in the test. It is also 
possible to determine whether an incorrect option is 
sufficiently attractive to the less able examinees to be 
helpful, and whether it draws so many of the more able 
examinees that it may indicate the presence of an 
ambiguity. This information also makes it possible to 
determine just where on the continuum of ability the 
item is making its maximum discrimination and, in an 
approximate way, whether the characteristic curve for 
each of the item options, including the correct option, 
is monotonic as it should be. 
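
Part of the tabulation just described, the number choosing each option and the mean total score of those who chose it, can be sketched in a few lines. The response codes, function names, and sample data are hypothetical, and the breakdown into five ability groups is omitted for brevity.

```python
from collections import defaultdict
from statistics import mean

def option_summary(responses, total_scores):
    """Count and mean total score for each response category of one item.

    Each response is an option letter, "omit", or "not reached".
    """
    totals_by_option = defaultdict(list)
    for choice, total in zip(responses, total_scores):
        totals_by_option[choice].append(total)
    return {
        choice: {"count": len(totals), "mean_total": mean(totals)}
        for choice, totals in totals_by_option.items()
    }

def percent_passing(responses, key):
    """Percent of examinees choosing the keyed (correct) option."""
    return 100.0 * sum(1 for r in responses if r == key) / len(responses)

# Hypothetical responses of eight examinees to one item keyed "B".
responses = ["B", "B", "A", "B", "C", "omit", "A", "B"]
totals = [30, 28, 15, 25, 12, 10, 18, 27]

summary = option_summary(responses, totals)
passing = percent_passing(responses, "B")
```

Here the examinees choosing the keyed option have the highest mean total score and those choosing the wrong options have lower means, which is the pattern expected of an item that is working properly.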





Test analysis 

After the first formal administration of each test form 
a specially selected sample of answer sheets (at least 
900, as was pointed out earlier, and sometimes as many 
as 2,000) is assembled and a report of a detailed analysis 
of the test is made. Reliabilities and standard errors of 
measurement of the separately timed parts of the test 
are calculated, as well as intercorrelations among the 
parts and the total score. Assessments of speededness 
in terms of percent completing the test are made, and 
distributions of the total formula score are presented, 
not only for the special sample but for all students tak- 
ing that form. In addition, distributions, means, and 
standard deviations are given of the number of items 
answered correctly, the number answered incorrectly, 
the number omitted, and the number not reached. As 
an additional check on speededness, a bivariate plot is 
given of score versus number of items attempted. 
Finally, distributions, means, and standard deviations 
are given of item deltas and biserial correlations for 
each separately timed section of the test. All of these 
data are summarized and evaluated in an introductory 
text to each test analysis, which is then used in guiding 
the development of future forms of the test. 



The score scale 

The results of all the tests in the Admissions Testing 
Program are reported on a score scale running from 
200 to 800. The nature of the program imposes on this 
score scale a number of requirements, most of which, 
at the present state of the art, can be met only approx- 
imately. These requirements are as follows: 

1. The number designating any position on the scale 
must represent the same level of competence for any 
form of a given test. Thus, a score of 563 on the verbal 
sections of the SAT should represent the same degree of 
competence in the functions measured by the test re- 
gardless of whether the candidate takes Form A or 
Form D or Form G of that test. If, for instance, the 
mean difficulty of the items in Form A happens to be 
somewhat greater than the mean difficulty of the items 
in Form D, this difference in average item difficulty 
should have no effect whatever on the scaled score a 
candidate is likely to receive. Thus, no candidate should 
be put at a special advantage or disadvantage because 
of the fortuitous administration of an easy or difficult 
test form. The probability of his obtaining a scaled 
score of 563 on Form A should be the same as the 
probability of his obtaining a scaled score of 563 on 
Form D or any other current form of the test. 

2. The number designating any position on the scale 
must represent the same level of competence for any 
individual or group of individuals taking the test. The 
same conversion from raw to scaled scores on a form 
of a test is used, whether the candidate is a male or a 
female, whether he comes from the North or the South, 
whether he comes from a public, independent, or church- 
related school, or whether he plans to major in fine arts 
or engineering. The scaled score is simply a reflection 
of his raw score, that is, the number of items he 
answered correctly minus a fraction of the number he 
answered incorrectly. It does not reflect the background 
characteristics of the candidate except insofar as they 
may have affected his performance on the test and 
caused him directly or indirectly to earn a higher or 
lower score on the test. 

Similarly, so long as we are operating with a system 
of parallel forms, the conversion from raw to scaled 
scores is independent of the background characteristics 
of the samples of individuals, indeed, independent of 
the specific ability characteristics of the samples used 
in deriving the conversion. Within broad limits the 
same conversion equation relating raw to scaled scores 
would result whether the individuals in the samples are 
male or female, adequately or inadequately trained, 
homogeneous or dispersed. On the other hand, when we 
are dealing with tests of dissimilar function, say French 
and Spanish, the conversion equation depends quite 
crucially on the characteristics of the samples. 

3. The number designating any position on the scale 
must represent the same level of competence for any 
test administration. That is, a score of 627 on the 
Chemistry Test should represent the same degree of 
measured competence in chemistry whether the test is 
taken in December or March or July. The corollary of 
this is that if a candidate takes a given test twice, and 
if there has been any measurable change in his com- 
petence between the two testings, the difference be- 
tween his two scaled scores on the test should, within 
the limits of sampling error, reflect that change, even 
if he has taken a different form of the test on the two 
occasions. 

4. The number designating any position on the scale 
for one test (e.g., Chemistry) is comparable to the same 
number on the scale for another test (e.g., French) in 
the sense that the two numbers represent the same 
relative positions in the same reference population. 
Thus, a score of 563 on the Chemistry Test and a score 
of 563 on the French Test both represent performances 
0.63 standard deviations above the mean of the same 
reference group, assuming that all members of that 
group have had adequate preparation for the tests in 
both Chemistry and French. 
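
Two of the quantities mentioned in the requirements above lend themselves to a small worked sketch: the raw (formula) score, and the standing of a scaled score in the reference group. The text says only that "a fraction" of the wrong answers is subtracted; the value 1/(k-1) for k-choice items used below is the conventional correction and is an assumption here, as are the function names.

```python
def formula_score(rights, wrongs, choices_per_item=5):
    """Raw score: rights minus a fraction of wrongs.

    Assumption: the fraction is 1/(k-1) for items with k choices.
    Omitted and not-reached items neither add nor subtract.
    """
    return rights - wrongs / (choices_per_item - 1)

def deviations_above_reference_mean(scaled_score, ref_mean=500.0, ref_sd=100.0):
    """Position of a scaled score in the reference group, in SD units."""
    return (scaled_score - ref_mean) / ref_sd

raw = formula_score(rights=40, wrongs=8)          # 40 - 8/4
standing = deviations_above_reference_mean(563)   # (563 - 500) / 100
```

The second function reproduces the example in the text: a scaled score of 563 lies 0.63 standard deviations above the mean of the reference group.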

One of the most persistent misconceptions of the 
200-800 scale used for reporting the results of College 
Board tests is that it is a standard score scale on which 
500 represents the mean score of all College Board can- 
didates and 100 represents the standard deviation of 
their scores. Another misconception is that 500 and 100 
represent the mean and standard deviation of all 
college freshmen. Still another is that 500 and 100 
represent the mean and standard deviation of all 
secondary school seniors. The fact is that the numbers 
500 and 100 simply refer to the mean and standard 
deviation of the group of 10,654 candidates who hap- 
pened to assemble to take the tests in April 1941 and 
who were used to define the scale system for the College 
Board. Since that group is of interest only from a his- 
torical point of view (in the sense that it simply marks 
the origin of the present College Board scale) and has 
no normative significance or usefulness in interpreting 
College Board scores, the 500-point similarly has no 
special significance other than that it is midway between 
the end points 200 and 800. The real significance of 
College Board scores lies in the continuity of the scales 
in the face of changing test forms and changing popula- 
tions and the fact that the scores have different norma- 
tive or evaluative meanings depending on the choice of 
the normative group. The scaled score of 200 does not 
stand for a raw score of zero; it is simply the lowest 
score reported. The scaled score of 800 does not stand 
for a perfect raw score; it is the highest score reported. 
This is not to say that a score of 200 and a score of 800 
will be possible on every form of every test. Some tests 
or test forms will be relatively difficult, with the result 
that the minimum possible score on the test will be 
above 200, and scores of 200 will not be possible. 
Similarly, some tests or test forms will be relatively 
easy, with the result that the maximum possible score 
on the test will be below 800, and scores of 800 will not 
be possible. Some of this variation is the natural conse- 
quence of random fluctuation in the construction of 
alternate forms. Some of it, however (especially the 
failure to reach 200), is the natural consequence of con- 
structing tests that are especially designed to be ap- 
propriate to highly select and able groups of candidates, 
for example, those who elect to take the Physics Test 
or the Mathematics Level II Test. 
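
The behavior of the end points described above can be illustrated with a sketch. It assumes, purely for illustration, that the conversion from raw to scaled scores for a given form is a straight line and that converted values outside the reported range are reported as 200 or 800; the slopes, intercepts, and raw-score range below are invented and do not describe any actual form.

```python
def to_reported_score(raw_score, slope, intercept, low=200, high=800):
    """Convert a raw score with a hypothetical linear conversion and
    hold the result within the reported 200-800 range."""
    scaled = slope * raw_score + intercept
    return max(low, min(high, round(scaled)))

# Hypothetical 60-item, five-choice form: formula scores run from -15 to 60.
MIN_RAW, MAX_RAW = -15, 60

# A relatively difficult form: the lowest possible score is above 200.
hard_low = to_reported_score(MIN_RAW, slope=8, intercept=350)
hard_high = to_reported_score(MAX_RAW, slope=8, intercept=350)

# A relatively easy form: the highest possible score is below 800.
easy_low = to_reported_score(MIN_RAW, slope=5, intercept=480)
easy_high = to_reported_score(MAX_RAW, slope=5, intercept=480)
```

On the first invented form no candidate can receive a 200, and on the second no candidate can receive an 800, which is the situation the text describes.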

Unlike other scales in common use that have built-in 
normative meaning, there is no inherent meaning 
claimed for the College Board scale system. When nor- 
mative interpretations are required, test users are urged 
to collect their own local norms. In some instances 
special norms studies are conducted by the Board. The 
meaning of the College Board scale is the meaning that 
is given to it by the test user himself. As he uses the 
scores, he comes over the course of time to attach cer- 
tain known levels of competence to the scores on the 
scale. That is to say, he learns to understand and appre- 
ciate the "meaning" of a score of 563 in the same sense 
that he has learned to understand and appreciate the 
"meaning" of 11 inches, a process which is possible only 
if the units remain constant. It is for this reason that 
the College Board devotes its effort to the job of pre- 
serving the constancy of that meaning, and it does so 
by an intricate system of form-to-form equating. That 
system will be described in the next two chapters. 

Norms 

The function of norms for college admission tests is to 
provide high school guidance counselors and college 



